Quantitative Seminar - 01/16/2025
Traditional educational research often fixates on average academic achievement.
Average performance and variability convey distinct information.
We adapt the Mixed-Effects Location Scale Model (MELSM) by incorporating a spike-and-slab prior into the scale component to select or shrink random effects.
Based on Bayes factors, we can decide whether a school is (in)consistent in its academic achievement.
Evidence for retaining the random effect is evidence of unusual variability.
The standard multilevel model (MLM) assumes a fixed within-school variance, potentially masking important differences in variability.
MELSM allows for the simultaneous estimation of a model for the means (location) and a model for the residual variance (scale).
Both sub-models are conceptualized as mixed-effect models.
\[\begin{equation} \textbf{v}_j= \begin{bmatrix} u_{0j} \\ t_{0j} \end{bmatrix} \sim \mathcal{N} \begin{pmatrix} \boldsymbol{0}= \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \boldsymbol{\Sigma}= \begin{bmatrix} \tau^2_{u_{0j}} & \tau_{u_{0j}t_{0j}} \\ \tau_{u_{0j}t_{0j}} & \tau^2_{t_{0j}} \end{bmatrix} \end{pmatrix} \end{equation}\]
Accounts for possible correlations among location and scale effects.
Allows the inclusion of specific predictors in both sub-models.
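A minimal simulation sketch of an intercept-only MELSM may help fix ideas. The fixed location intercept `beta0`, all parameter values, and the function name are assumptions for illustration and are not taken from the analysis. The residual SD is modeled on the log scale, so it stays positive.

```python
import math
import random

def simulate_melsm(n_schools=50, n_students=30, beta0=0.0, eta0=0.0,
                   tau_u=0.5, tau_t=0.3, rho=0.2, seed=1):
    """Simulate scores from an intercept-only MELSM (hypothetical values)."""
    rng = random.Random(seed)
    data = []
    for j in range(n_schools):
        # Correlated random effects (u_0j, t_0j) with SDs tau_u and tau_t
        # and correlation rho, built from two independent standard normals.
        z_u, z_t = rng.gauss(0, 1), rng.gauss(0, 1)
        u0j = tau_u * z_u
        t0j = tau_t * (rho * z_u + math.sqrt(1 - rho**2) * z_t)
        # Location sub-model: school-specific mean.
        mu_j = beta0 + u0j
        # Scale sub-model: school-specific residual SD via a log link.
        sigma_j = math.exp(eta0 + t0j)
        scores = [rng.gauss(mu_j, sigma_j) for _ in range(n_students)]
        data.append({"school": j, "mu": mu_j, "sigma": sigma_j, "scores": scores})
    return data
```

Setting `tau_t=0.0` removes the scale random effect, so every school shares the same residual SD, which is the usual MLM assumption.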
We incorporate the spike-and-slab prior as a method of variable selection of random effects in the scale model.
The model is allowed to switch between two assumptions:
\[\begin{equation} \color{lightgray}{ \textbf{v}= \begin{bmatrix} u_0 \\ t_0 \end{bmatrix} \sim \mathcal{N}} \begin{pmatrix} \color{lightgray}{ \boldsymbol{0}= \begin{bmatrix} 0 \\ 0 \end{bmatrix},} \boldsymbol{\Sigma}= \begin{bmatrix} \tau^2_{u_0} & \tau_{u_0t_0} \\ \tau_{u_0t_0} & \tau^2_{t_0} \end{bmatrix} \end{pmatrix} \end{equation}\]
\[\begin{equation} \label{eq:cholesky_approach} \textbf{L} = \begin{pmatrix} 1 & 0 \\ \rho_{u_0t_0} & \sqrt{1 - \rho_{u_0t_0}^2} \end{pmatrix} \end{equation}\]
If we multiply \(\textbf{L}\) by the random effect standard deviations, \(\boldsymbol{\tau}\), and scale it with a standard normally distributed \(\boldsymbol{z}\), we obtain \(\textbf{v}\):
\[\begin{equation} \textbf{v} = \boldsymbol{\tau}\textbf{L}\boldsymbol{z} \end{equation}\]
The Cholesky decomposition allows expressing the random effects in terms of the standard deviations and correlations.
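The construction above can be sketched in a few lines. This sketch assumes that \(\boldsymbol{\tau}\) acts as a diagonal matrix of standard deviations; the function names are hypothetical. `implied_covariance` computes \(\text{diag}(\boldsymbol{\tau})\,\textbf{L}\textbf{L}^\top \text{diag}(\boldsymbol{\tau})\), which should reproduce \(\boldsymbol{\Sigma}\).

```python
import math

def cholesky_corr(rho):
    """Lower-triangular Cholesky factor L of a 2x2 correlation matrix."""
    return [[1.0, 0.0], [rho, math.sqrt(1.0 - rho**2)]]

def random_effects(tau, rho, z):
    """Return v = diag(tau) L z for one school, with z a pair of N(0, 1) draws."""
    L = cholesky_corr(rho)
    return [tau[k] * sum(L[k][m] * z[m] for m in range(2)) for k in range(2)]

def implied_covariance(tau, rho):
    """Covariance implied by the construction: diag(tau) L L^T diag(tau)."""
    L = cholesky_corr(rho)
    return [[tau[a] * tau[b] * sum(L[a][m] * L[b][m] for m in range(2))
             for b in range(2)] for a in range(2)]
```

With `tau = [0.5, 0.3]` and `rho = 0.2`, the implied variances are 0.25 and 0.09 and the covariance is 0.5 × 0.3 × 0.2 = 0.03, matching the entries of \(\boldsymbol{\Sigma}\).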
\[\begin{equation} \begin{aligned} u_{0j} &= \tau_{u_0}z_{ju_0}\\ t_{0j} &= \tau_{t_0}\left( \rho_{u_0t_0}z_{ju_0} + z_{jt_0}\sqrt{1 - \rho_{u_0t_0}^2} \right)\color{red}{\delta_{jt_0}} \end{aligned} \end{equation}\]
\[\begin{equation} t_{0j} = \tau_{t_0}\left( \rho_{u_0t_0}z_{ju_0} + z_{jt_0}\sqrt{1 - \rho_{u_0t_0}^2} \right)\color{red}{\delta_{jt_0}} \end{equation}\]
Each element of \(\boldsymbol{\delta}_j\) takes a value in \(\{0,1\}\) and follows a Bernoulli distribution, \(\delta_{jk} \sim \text{Bernoulli}(\pi)\), where \(\pi\) is the prior inclusion probability.
When a 0 is sampled, the portion after the fixed effect drops out of the equation.
\[\begin{equation} \label{eq:mm_delta} \sigma_{\varepsilon_{ij}} = \begin{cases} \exp(\eta_0 + 0), & \text{if }\delta_{jt_0} = 0 , \\ \exp(\eta_0 + t_{0j}), & \text{if }\delta_{jt_0} = 1 \end{cases} \end{equation}\]
Throughout the MCMC sampling process \(\delta\) switches between the spike and slab.
If \(\delta= 0\), the density “spikes” at the zero point mass;
If \(\delta= 1\), the standard normal prior, \(z_{jk}\), is retained and scaled by \(\tau_k\), introducing the “slab”.
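The switching behavior for a single draw can be sketched as follows. The function names and the numeric inputs in the usage note are hypothetical; the formulas follow the equations for \(t_{0j}\) and \(\sigma_{\varepsilon_{ij}}\) above.

```python
import math

def scale_random_effect(tau_t, rho, z_u, z_t, delta):
    """Scale random effect t_0j with the inclusion indicator delta in {0, 1}."""
    slab = tau_t * (rho * z_u + z_t * math.sqrt(1.0 - rho**2))
    # delta = 0 collapses the effect onto the spike at zero;
    # delta = 1 keeps the slab value.
    return slab * delta

def residual_sd(eta0, t0j):
    """Within-school residual SD on the natural scale (log link)."""
    return math.exp(eta0 + t0j)
```

For example, `residual_sd(0.1, scale_random_effect(0.3, 0.2, 1.0, -0.5, 0))` equals `math.exp(0.1)`, the fixed-effect-only value, whereas passing `delta=1` yields a school-specific SD.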
\[\begin{align} \label{eq:pip_theorical} Pr(\delta_{jk} = 1 | \textbf{Y}) = \frac{Pr(\textbf{Y} | \delta_{jk} = 1)Pr(\delta_{jk} = 1)}{Pr(\textbf{Y})} \end{align}\]
The posterior inclusion probability (PIP) is estimated by the proportion of MCMC samples where \(\delta_{jk} = 1\):
\[\begin{align} \label{eq:pip} Pr(\delta_{jk} = 1 | \textbf{Y}) = \frac{1}{S} \sum_{s = 1}^S \delta_{jks} \end{align}\]
where \(S\) is the total number of posterior samples.
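As a minimal sketch, the estimator is the mean of the sampled indicators for one school and one random effect. The function name and the example draws are hypothetical.

```python
def posterior_inclusion_probability(delta_draws):
    """Estimate the PIP as the share of posterior draws with delta = 1."""
    if not delta_draws:
        raise ValueError("need at least one posterior draw")
    return sum(delta_draws) / len(delta_draws)
```

For instance, `posterior_inclusion_probability([1, 1, 0, 1])` gives 0.75, since three of the four draws include the random effect.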
If there is evidence for zero variance in the scale random effects, the model reduces to the MLM assumption:
\[\varepsilon_{ij}\sim\mathcal{N}(0, \sigma_\varepsilon)\]
If not, the MELSM assumption of variance heterogeneity is retained:
\[\varepsilon_{ij}\sim\mathcal{N}(0, \sigma_{\varepsilon_{ij}})\]
The PIP gives us a probabilistic measure and does not perform automatic variable selection. We estimate the strength of evidence through Bayes factors:
\[\begin{align} \label{eq:bf_pip} BF_{10j} = \frac{Pr(\delta_{jk} = 1 | \textbf{Y}) }{1 - Pr(\delta_{jk} = 1 | \textbf{Y}) } \end{align}\]
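A BF\(_{10}\) > 3 corresponds to a PIP > 0.75 when the prior inclusion probability \(\pi\) is 0.5, because the prior odds are then 1 and the Bayes factor equals the posterior odds.
In that case, including the random effect is at least three times more probable than excluding it.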
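The conversion from PIP to Bayes factor can be sketched as below. Dividing by the prior odds is the general form and is an addition here; with `prior_pi = 0.5` the prior odds are 1 and the result reduces to the posterior odds given above. The function name is hypothetical.

```python
import math

def bayes_factor_inclusion(pip, prior_pi=0.5):
    """Bayes factor for inclusion: posterior odds divided by prior odds."""
    if not 0.0 < prior_pi < 1.0:
        raise ValueError("prior_pi must lie strictly between 0 and 1")
    if not 0.0 <= pip <= 1.0:
        raise ValueError("pip must lie between 0 and 1")
    if pip == 1.0:
        # Every draw included the effect; the estimated odds are unbounded.
        return math.inf
    posterior_odds = pip / (1.0 - pip)
    prior_odds = prior_pi / (1.0 - prior_pi)
    return posterior_odds / prior_odds
```

With the default prior, `bayes_factor_inclusion(0.75)` returns 3.0, the threshold used in this talk.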
We use a subset of data from the 2021 Brazilian Evaluation System of Elementary Education (Saeb) test.
It focuses on math scores from 11th- and 12th-grade students across 160 randomly selected schools, encompassing a total of 11,386 students.
The analysis compares three SS-MELSM models with varying levels of complexity:
The models were fitted using the ivd package in R (Rast & Carmo, 2024).
All models were fitted with six chains of 3,000 iterations and 12,000 warm-up samples.
We assessed convergence using \(\hat{R}\) and estimation efficiency using the effective sample size (ESS).
The models were compared for predictive accuracy using PSIS-LOO cross-validation.
Model 1 identified eight schools with PIPs exceeding 0.75, suggesting notable deviations from the average within-school variance.
By incorporating SES covariates, Model 2 significantly outperformed Model 1 in terms of predictive accuracy, \(\Delta\widehat{\text{elpd}}_{\text{loo}}= -43.6 (10.5)\).
Model 3 was practically indistinguishable from Model 2; the inclusion of a random slope for the student-level SES did not improve the model’s predictive accuracy, \(\Delta\widehat{\text{elpd}}_{\text{loo}}= -1.3 (0.6)\).
The SS-MELSM helps identify schools that deviate from the norm in terms of within-school variability.
The spike-and-slab prior accounts for uncertainty in including random effects.
Identifying variability can guide resource allocation or teaching interventions.
Currently, the SS-MELSM demands significant computational resources, especially with bigger datasets or more complex models.
It is still not clear how model performance is affected by the choice of hyperparameters.
Further development could explore the method’s performance in longitudinal data settings.
Read the preprint at
Beyond Averages with MELSM and Spike-and-Slab